
    An MPI-IO In-Memory driver for non-volatile pooled memory of the Kove XPD

    Many scientific applications are limited by the performance offered by parallel file systems. SSD-based burst buffers provide significantly better performance than HDD-backed storage, but at the expense of capacity. Clearly, achieving wire-speed of the interconnect and predictable low-latency I/O is the holy grail of storage. In-memory storage promises to provide optimal performance, exceeding SSD-based solutions. Kove®’s XPD® offers pooled memory for cluster systems. This remote memory is asynchronously backed up to storage devices of the XPDs and is considered to be non-volatile. Although the system offers various APIs to access this memory, such as treating it as a block device, it does not allow exposing it as a file system that offers POSIX or MPI-IO semantics. In this paper, we (1) describe the XPD-MPIIO-driver, which supports the scale-out architecture of the XPDs. This MPI-agnostic driver enables high-level libraries to utilize the XPD’s memory as storage. (2) A thorough performance evaluation of the XPD is conducted. This includes scale-out testing of the infrastructure and “metadata” operations, but also performance variability. We show that the driver and storage architecture are able to nearly saturate the wire-speed of InfiniBand (60+ GiB/s with 14 FDR links) while providing low latency and little performance variability.
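    To make the driver's role concrete, the following is a minimal sketch (in Python with mpi4py) of the MPI-IO write pattern such a driver has to serve; the file name, block size, and use of mpi4py are illustrative assumptions and not part of the XPD-MPIIO-driver itself.

        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()

        # Each rank writes one contiguous block of a shared file via MPI-IO;
        # an in-memory driver would have to serve exactly this access pattern.
        n = 1024 * 1024                      # elements per rank (assumed)
        data = np.full(n, rank, dtype=np.float64)

        fh = MPI.File.Open(comm, "checkpoint.dat",
                           MPI.MODE_CREATE | MPI.MODE_WRONLY)
        fh.Write_at_all(rank * data.nbytes, data)   # collective write at per-rank offset
        fh.Close()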

    Benefit of DDN's IME-FUSE for I/O intensive HPC applications

    Many scientific applications are limited by the I/O performance offered by parallel file systems on conventional storage systems. Flash-based burst buffers provide significantly better performance than HDD-backed storage, but at the expense of capacity. Burst buffers are considered the next step towards achieving wire-speed of the interconnect and providing more predictable low-latency I/O, which are the holy grail of storage. A critical evaluation of storage technology is mandatory, as there is no long-term experience with the performance behavior for particular application scenarios. Such an evaluation enables data centers to choose the right products and system architects to integrate them into HPC architectures. This paper investigates the native performance of DDN-IME, a flash-based burst buffer solution. Then, it takes a closer look at the IME-FUSE file system, which uses IMEs as burst buffer and a Lustre file system as back-end. Finally, by utilizing a NetCDF benchmark, it estimates the performance benefit for climate applications.
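    As an illustration of the kind of NetCDF benchmark mentioned above, the sketch below writes a time series of a 2D field and reports the achieved throughput; variable names, dimensions, and sizes are invented and do not correspond to the benchmark used in the paper.

        import time
        import numpy as np
        from netCDF4 import Dataset

        # Invented sizes standing in for a climate-model output field.
        nt, ny, nx = 10, 1024, 2048
        field = np.random.rand(ny, nx).astype("f4")

        ds = Dataset("bench.nc", "w", format="NETCDF4")
        ds.createDimension("time", None)
        ds.createDimension("y", ny)
        ds.createDimension("x", nx)
        var = ds.createVariable("tas", "f4", ("time", "y", "x"))

        start = time.time()
        for t in range(nt):
            var[t, :, :] = field                 # one time step per write
        ds.close()
        elapsed = time.time() - start
        print(f"write throughput: {nt * field.nbytes / elapsed / 2**20:.1f} MiB/s")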

    Predicting performance of non-contiguous I/O with machine learning

    Data sieving in ROMIO promises to optimize individual non-contiguous I/O. However, making the right choice and parameterizing its buffer size accordingly are non-trivial tasks, since predicting the resulting performance is difficult. Since many performance factors are not taken into account by data sieving, extracting the optimal performance for a given access pattern and system is often not possible. Additionally, in Lustre, settings such as the stripe size and the number of servers are tunable, yet again, identifying rules for the data centre proves challenging. In this paper, we (1) discuss limitations of data sieving, (2) apply machine learning techniques to build a performance predictor, and (3) learn and extract best practices for the settings from the data. We used decision trees, as these models can capture non-linear behavior, are easy to understand, and allow for extraction of the rules used. Even though this initial research is based on decision trees trained with sparse data, the algorithm can predict many cases sufficiently well. Compared to a standard setting, the decision trees created are able to improve performance significantly, and we can derive expert knowledge by extracting rules from the learned tree. Applying the scheme to a set of experimental data improved the average throughput by 25–50 % of the best parametrization’s gain. Additionally, we demonstrate the versatility of this approach by applying it to the porting system of DKRZ’s next-generation supercomputer and discuss achievable performance gains.
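    A minimal sketch of such a decision-tree predictor is given below, using scikit-learn; the feature set, the training records, and the throughput values are made up for illustration and are not the data used in the paper.

        import numpy as np
        from sklearn.tree import DecisionTreeRegressor, export_text

        # Invented training records: (access size, sieving buffer size, stripe size,
        # number of servers, data sieving enabled) -> measured throughput in MiB/s.
        X = np.array([
            [4096,        4 * 2**20, 1 * 2**20, 2, 1],
            [4096,       64 * 2**20, 4 * 2**20, 8, 1],
            [1 * 2**20,   4 * 2**20, 1 * 2**20, 2, 0],
            [1 * 2**20,  64 * 2**20, 4 * 2**20, 8, 0],
        ])
        y = np.array([120.0, 480.0, 900.0, 2400.0])

        model = DecisionTreeRegressor(max_depth=3).fit(X, y)

        # Predict the throughput of an unseen configuration ...
        print(model.predict([[4096, 16 * 2**20, 2 * 2**20, 4, 1]]))

        # ... and extract human-readable rules, mirroring how best practices
        # can be derived from the learned tree.
        print(export_text(model, feature_names=[
            "access_size", "buffer_size", "stripe_size", "servers", "sieving"]))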

    Towards decoupling the selection of compression algorithms from quality constraints – an investigation of lossy compression efficiency

    Data-intensive scientific domains use data compression to reduce the storage space needed. Lossless data compression preserves information accurately, but lossy data compression can achieve much higher compression rates depending on the tolerable error margins. There are many ways of defining precision and of exploiting this knowledge; therefore, the field of lossy compression is subject to active research. From the perspective of a scientist, only the qualitative definition of the implied loss of data precision should matter. With the Scientific Compression Library (SCIL), we are developing a meta-compressor that allows users to define various quantities for acceptable error and expected performance behavior. The library then picks a suitable chain of algorithms satisfying the user’s requirements; the ongoing work is a preliminary stage for the design of an adaptive selector. This approach is a crucial step towards a scientifically safe use of much-needed lossy data compression, because it disentangles the task of determining scientific characteristics of tolerable noise from the task of determining an optimal compression strategy. Future algorithms can be used without changing application code. In this paper, we evaluate various lossy compression algorithms for compressing different scientific datasets (Isabel, ECHAM6), and we focus on the analysis of synthetically created data that serves as a blueprint for many observed datasets. We also briefly describe the available quantities of SCIL to define data precision and introduce two efficient compression algorithms for individual data points. This shows that the best algorithm depends on user settings and data properties.
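    The snippet below illustrates the underlying idea (stating a tolerable absolute error and letting the library decide how to meet it) with a simple quantize-then-deflate scheme; it is not the SCIL API, and the tolerance, data, and algorithm chain are assumptions for illustration.

        import zlib
        import numpy as np

        def compress(data: np.ndarray, abs_tol: float) -> bytes:
            # Quantize onto a grid of spacing 2*abs_tol, so the reconstruction
            # error never exceeds abs_tol, then apply a lossless back-end.
            q = np.round(data / (2 * abs_tol)).astype(np.int64)
            return zlib.compress(q.tobytes())

        def decompress(blob: bytes, abs_tol: float, shape) -> np.ndarray:
            q = np.frombuffer(zlib.decompress(blob), dtype=np.int64)
            return q.reshape(shape) * (2 * abs_tol)

        field = np.random.rand(256, 256)
        blob = compress(field, abs_tol=1e-3)
        restored = decompress(blob, 1e-3, field.shape)
        assert np.max(np.abs(restored - field)) <= 1e-3
        print(f"compression ratio: {field.nbytes / len(blob):.1f}x")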

    Survey of storage systems for high-performance computing

    In current supercomputers, storage is typically provided by parallel distributed file systems for hot data and tape archives for cold data. These file systems are often compatible with local file systems due to their use of the POSIX interface and semantics, which eases development and debugging because applications can easily run both on workstations and supercomputers. There is a wide variety of file systems to choose from, each tuned for different use cases and implementing different optimizations. However, the overall application performance is often held back by I/O bottlenecks due to insufficient performance of file systems or I/O libraries for highly parallel workloads. Performance problems are dealt with using novel storage hardware technologies as well as alternative I/O semantics and interfaces. These approaches have to be integrated into the storage stack seamlessly to make them convenient to use. Upcoming storage systems abandon the traditional POSIX interface and semantics in favor of alternative concepts such as object and key-value storage; moreover, they heavily rely on technologies such as NVM and burst buffers to improve performance. Additional tiers of storage hardware will increase the importance of hierarchical storage management. Many of these changes will be disruptive and require application developers to rethink their approaches to data management and I/O. A thorough understanding of today's storage infrastructures, including their strengths and weaknesses, is crucially important for designing and implementing scalable storage systems suitable for the demands of exascale computing.
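    To make the contrast between the POSIX interface and key-value-style storage concrete, the sketch below puts both side by side; the path, the key naming scheme, and the use of a plain dictionary as a stand-in for an object store are assumptions for illustration only.

        import os

        # POSIX-style access: byte streams addressed by path and offset,
        # with consistency semantics enforced by the file system.
        fd = os.open("results.dat", os.O_CREAT | os.O_WRONLY, 0o644)
        os.pwrite(fd, b"temperature=287.4", 0)
        os.close(fd)

        # Key-value-style access: opaque values addressed by a key, with no
        # offsets and no directory tree (a dict stands in for the store).
        store = {}
        store["sim42/step0001/temperature"] = b"287.4"
        value = store["sim42/step0001/temperature"]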

    State of the Art and Future Trends in Data Reduction for High-Performance Computing

    Research into data reduction techniques has gained popularity in recent years as storage capacity and performance become a growing concern. This survey paper provides an overview of leveraging points found in high-performance computing (HPC) systems and suitable mechanisms to reduce data volumes. We present the underlying theories and their application throughout the HPC stack and also discuss related hardware acceleration and reduction approaches. After introducing relevant use cases, an overview of modern lossless and lossy compression algorithms and their respective usage at the application and file system layer is given. In anticipation of their increasing relevance for adaptive and in situ approaches, dimensionality reduction techniques are summarized with a focus on non-linear feature extraction. Adaptive approaches and in situ compression algorithms and frameworks follow. The key stages of and new opportunities for deduplication are covered next. Finally, recomputation is proposed as an unconventional but promising method. We conclude the survey with an outlook on future developments.
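    As one concrete example of the techniques surveyed, the following sketch shows fixed-size-chunk deduplication by content hashing; the chunk size, hash choice, and in-memory store are assumptions for illustration.

        import hashlib

        CHUNK = 4096                 # assumed chunk size
        store = {}                   # digest -> chunk payload (stored once)

        def write(data: bytes) -> list:
            # A "file" becomes an ordered list of chunk digests (its recipe);
            # identical chunks are detected by their hash and stored only once.
            recipe = []
            for i in range(0, len(data), CHUNK):
                chunk = data[i:i + CHUNK]
                digest = hashlib.sha256(chunk).hexdigest()
                store.setdefault(digest, chunk)
                recipe.append(digest)
            return recipe

        recipe = write(b"A" * 16384 + b"B" * 4096)
        print(len(store), "unique chunks backing", len(recipe), "references")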